Exploration server on relations between France and Australia

Warning: this site is under development.
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Noise adaptive speech recognition based on sequential noise parameter estimation

Internal identifier: 00AB73 (Main/Exploration); previous: 00AB72; next: 00AB74

Authors: Kaisheng Yao [Japan]; Kuldip K. Paliwal [Japan, Australia]; Satoshi Nakamura [Japan]

Source: Speech communication (ISSN 0167-6393), 2004

RBID : Pascal:04-0276268

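French descriptors

Reconnaissance parole; Estimation paramètre; Estimation séquentielle; Réduction bruit; Algorithme EM; Processus non stationnaire; Bruit additif

English descriptors

Additive noise; EM algorithm; Noise reduction; Non stationary process; Parameter estimation; Sequential estimation; Speech recognition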

Abstract

In this paper, a noise adaptive speech recognition approach is proposed for recognizing speech corrupted by additive non-stationary background noise. The approach sequentially estimates noise parameters, through which a non-linear parametric function adapts the mean vectors of the acoustic models. In the estimation process, the posterior probability of the state sequence given the observation sequence and the previously estimated noise parameter sequence is approximated by the normalized joint likelihood of the active partial paths and the observation sequence given the previously estimated noise parameter sequence. The Viterbi process provides the normalized joint likelihood. The acoustic models need not be trained from clean speech; they can also be trained from noisy speech. The approach can be applied to continuous speech recognition in the presence of non-stationary noise. Experiments conducted on speech contaminated by simulated and real non-stationary noise show that, when the acoustic models are trained from clean speech, the noise adaptive speech recognition system improves word accuracy in slowly time-varying noise compared with the normal noise compensation system (which assumes the noise to be stationary). When the acoustic models are trained from noisy speech, the noise adaptive speech recognition system is found to improve performance in slowly time-varying noise over a system employing multi-conditional training.
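
The mean-adaptation step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state which non-linear parametric function is used; the sketch assumes the log-add mismatch function that is commonly applied to additive noise in the log-spectral domain. The name `adapt_mean`, the three-bin vectors and the noise values are hypothetical. The sequential estimation of the noise parameters themselves is not reproduced; a hand-written noise track stands in for it.

```python
import math

def adapt_mean(clean_mean, noise_mean):
    """Adapt a clean-speech mean vector to noise, bin by bin.

    Assumption (not taken from the abstract): log-add mismatch function
    in the log-spectral domain, y = x + log(1 + exp(n - x)).
    """
    return [x + math.log1p(math.exp(n - x)) for x, n in zip(clean_mean, noise_mean)]

# Hypothetical clean-speech mean of one acoustic-model state (3 log-spectral bins).
clean_mean = [10.0, 8.0, 6.0]

# Hypothetical slowly rising noise estimates, one vector per time step.
# In the paper these would come from the sequential estimator.
noise_track = [[2.0, 2.0, 2.0], [6.0, 6.0, 6.0], [9.0, 9.0, 9.0]]

# Re-adapt the mean each time the noise estimate changes.
adapted = [adapt_mean(clean_mean, noise) for noise in noise_track]

for noise, mean in zip(noise_track, adapted):
    print(noise, [round(v, 3) for v in mean])
```

Under this assumed function the adapted mean stays close to the clean mean while the noise is well below the speech level and follows the noise level once the noise dominates, which is why the noise estimate has to be tracked when the noise is non-stationary.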


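Affiliations:

ATR Spoken Language Translation Research Labs, Kyoto, Japan (authors 1, 2 and 3)
School of Microelectronic Engineering, Griffith University, Brisbane, Australia (author 2)
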
The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en" level="a">Noise adaptive speech recognition based on sequential noise parameter estimation</title>
<author>
<name sortKey="Kaisheng Yao" sort="Kaisheng Yao" uniqKey="Kaisheng Yao" last="Kaisheng Yao">KAISHENG YAO</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>ATR Spoken Language Translation Research Labs</s1>
<s2>Kyoto</s2>
<s3>JPN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Japon</country>
<wicri:noRegion>ATR Spoken Language Translation Research Labs</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Paliwal, Kuldip K" sort="Paliwal, Kuldip K" uniqKey="Paliwal K" first="Kuldip K." last="Paliwal">Kuldip K. Paliwal</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>ATR Spoken Language Translation Research Labs</s1>
<s2>Kyoto</s2>
<s3>JPN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Japon</country>
<wicri:noRegion>ATR Spoken Language Translation Research Labs</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1">
<inist:fA14 i1="02">
<s1>School of Microelectronic Engineering, Griffith University</s1>
<s2>Brisbane</s2>
<s3>AUS</s3>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>Australie</country>
<wicri:noRegion>Brisbane</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Nakamura, Satoshi" sort="Nakamura, Satoshi" uniqKey="Nakamura S" first="Satoshi" last="Nakamura">Satoshi Nakamura</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>ATR Spoken Language Translation Research Labs</s1>
<s2>Kyoto</s2>
<s3>JPN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Japon</country>
<wicri:noRegion>ATR Spoken Language Translation Research Labs</wicri:noRegion>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">INIST</idno>
<idno type="inist">04-0276268</idno>
<date when="2004">2004</date>
<idno type="stanalyst">PASCAL 04-0276268 INIST</idno>
<idno type="RBID">Pascal:04-0276268</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">004F03</idno>
<idno type="wicri:Area/PascalFrancis/Curation">001211</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">004A34</idno>
<idno type="wicri:explorRef" wicri:stream="PascalFrancis" wicri:step="Checkpoint">004A34</idno>
<idno type="wicri:doubleKey">0167-6393:2004:Kaisheng Yao:noise:adaptive:speech</idno>
<idno type="wicri:Area/Main/Merge">00B861</idno>
<idno type="wicri:Area/Main/Curation">00AB73</idno>
<idno type="wicri:Area/Main/Exploration">00AB73</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a">Noise adaptive speech recognition based on sequential noise parameter estimation</title>
<author>
<name sortKey="Kaisheng Yao" sort="Kaisheng Yao" uniqKey="Kaisheng Yao" last="Kaisheng Yao">KAISHENG YAO</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>ATR Spoken Language Translation Research Labs</s1>
<s2>Kyoto</s2>
<s3>JPN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Japon</country>
<wicri:noRegion>ATR Spoken Language Translation Research Labs</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Paliwal, Kuldip K" sort="Paliwal, Kuldip K" uniqKey="Paliwal K" first="Kuldip K." last="Paliwal">Kuldip K. Paliwal</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>ATR Spoken Language Translation Research Labs</s1>
<s2>Kyoto</s2>
<s3>JPN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Japon</country>
<wicri:noRegion>ATR Spoken Language Translation Research Labs</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1">
<inist:fA14 i1="02">
<s1>School of Microelectronic Engineering, Griffith University</s1>
<s2>Brisbane</s2>
<s3>AUS</s3>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>Australie</country>
<wicri:noRegion>Brisbane</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Nakamura, Satoshi" sort="Nakamura, Satoshi" uniqKey="Nakamura S" first="Satoshi" last="Nakamura">Satoshi Nakamura</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>ATR Spoken Language Translation Research Labs</s1>
<s2>Kyoto</s2>
<s3>JPN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Japon</country>
<wicri:noRegion>ATR Spoken Language Translation Research Labs</wicri:noRegion>
</affiliation>
</author>
</analytic>
<series>
<title level="j" type="main">Speech communication</title>
<title level="j" type="abbreviated">Speech commun.</title>
<idno type="ISSN">0167-6393</idno>
<imprint>
<date when="2004">2004</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt>
<title level="j" type="main">Speech communication</title>
<title level="j" type="abbreviated">Speech commun.</title>
<idno type="ISSN">0167-6393</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Additive noise</term>
<term>EM algorithm</term>
<term>Noise reduction</term>
<term>Non stationary process</term>
<term>Parameter estimation</term>
<term>Sequential estimation</term>
<term>Speech recognition</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr">
<term>Reconnaissance parole</term>
<term>Estimation paramètre</term>
<term>Estimation séquentielle</term>
<term>Réduction bruit</term>
<term>Algorithme EM</term>
<term>Processus non stationnaire</term>
<term>Bruit additif</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">In this paper, a noise adaptive speech recognition approach is proposed for recognizing speech corrupted by additive non-stationary background noise. The approach sequentially estimates noise parameters, through which a non-linear parametric function adapts the mean vectors of the acoustic models. In the estimation process, the posterior probability of the state sequence given the observation sequence and the previously estimated noise parameter sequence is approximated by the normalized joint likelihood of the active partial paths and the observation sequence given the previously estimated noise parameter sequence. The Viterbi process provides the normalized joint likelihood. The acoustic models need not be trained from clean speech; they can also be trained from noisy speech. The approach can be applied to continuous speech recognition in the presence of non-stationary noise. Experiments conducted on speech contaminated by simulated and real non-stationary noise show that, when the acoustic models are trained from clean speech, the noise adaptive speech recognition system improves word accuracy in slowly time-varying noise compared with the normal noise compensation system (which assumes the noise to be stationary). When the acoustic models are trained from noisy speech, the noise adaptive speech recognition system is found to improve performance in slowly time-varying noise over a system employing multi-conditional training.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>Australie</li>
<li>Japon</li>
</country>
</list>
<tree>
<country name="Japon">
<noRegion>
<name sortKey="Kaisheng Yao" sort="Kaisheng Yao" uniqKey="Kaisheng Yao" last="Kaisheng Yao">KAISHENG YAO</name>
</noRegion>
<name sortKey="Nakamura, Satoshi" sort="Nakamura, Satoshi" uniqKey="Nakamura S" first="Satoshi" last="Nakamura">Satoshi Nakamura</name>
<name sortKey="Paliwal, Kuldip K" sort="Paliwal, Kuldip K" uniqKey="Paliwal K" first="Kuldip K." last="Paliwal">Kuldip K. Paliwal</name>
</country>
<country name="Australie">
<noRegion>
<name sortKey="Paliwal, Kuldip K" sort="Paliwal, Kuldip K" uniqKey="Paliwal K" first="Kuldip K." last="Paliwal">Kuldip K. Paliwal</name>
</noRegion>
</country>
</tree>
</affiliations>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Asie/explor/AustralieFrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 00AB73 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 00AB73 | SxmlIndent | more
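
Where Dilib is not available, the record can also be read with a general-purpose XML parser. The following is a minimal sketch, not part of Dilib: it uses Python's standard `xml.etree.ElementTree` on a copy of the `<tree>` fragment of the record, trimmed to the `sortKey` attribute. The function name `authors_by_country` is hypothetical. The full record uses the `wicri:` and `inist:` prefixes without declaring them, so it would probably need namespace declarations added before a strict parser accepts it; the fragment below avoids them.

```python
import xml.etree.ElementTree as ET

# Copy of the <affiliations>/<tree> fragment of the record, trimmed for brevity.
FRAGMENT = """
<tree>
<country name="Japon">
<noRegion>
<name sortKey="Kaisheng Yao">KAISHENG YAO</name>
</noRegion>
<name sortKey="Nakamura, Satoshi">Satoshi Nakamura</name>
<name sortKey="Paliwal, Kuldip K">Kuldip K. Paliwal</name>
</country>
<country name="Australie">
<noRegion>
<name sortKey="Paliwal, Kuldip K">Kuldip K. Paliwal</name>
</noRegion>
</country>
</tree>
"""

def authors_by_country(xml_text):
    """Return {country name: [author names]} in document order."""
    root = ET.fromstring(xml_text.strip())
    result = {}
    for country in root.findall("country"):
        # iter() also descends into <noRegion> wrappers.
        result[country.get("name")] = [name.text for name in country.iter("name")]
    return result

groups = authors_by_country(FRAGMENT)
for country, names in groups.items():
    print(country, names)
```

The country names are kept in French ("Japon", "Australie") because they are the values stored in the `name` attributes of the record.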

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Wicri/Asie
   |area=    AustralieFrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     Pascal:04-0276268
   |texte=   Noise adaptive speech recognition based on sequential noise parameter estimation
}}

This area was generated with Dilib version V0.6.33.
Data generation: Tue Dec 5 10:43:12 2017. Site generation: Tue Mar 5 14:07:20 2024